Programming Homework 3

Submission instructions: Please use your Python notebook for both your code and your written answers. You can do this by adding “text cells” or “markdown cells” to the notebook and typing your answers there. At the end, you will have one document (the notebook) with all your answers. To submit, save the notebook as a PDF and submit this PDF as your answer. We will also accept other formats if this does not work for you.

Starter code and required files

In this homework, you will implement and train a neural network using TensorFlow and Keras to solve a high-dimensional regression problem.

Regression in high dimension is, in general, a very hard problem. Assume, for example, that you build a uniform Cartesian grid on the unit cube in dimension $d$, with 10 samples along each direction. The total number of sample points is then $10^d$, which quickly becomes intractable even for moderate values of $d$ (for the $d = 300$ used in this homework, that would be $10^{300}$ points).

For many functions, however, although $d$ may be large, the effective dimensionality of the problem can be much smaller. Consider, for example, a scalar function $f$ and a unit vector $u \in \mathbb R^d$. You can define:

\[g(x) = f(u^T x)\]

Although $g$ takes a high-dimensional vector $x$ as input, it can still be “learned” because there is only one direction along which $g$ changes, namely $u$. This can be extended to multivariable functions $f$. For example, $f$ may take as input a vector in $\mathbb R^3$, $f(x_1, x_2, x_3)$. In this case, $u$ is replaced by a matrix of dimension $d \times 3$, whose columns span a subspace called the active subspace. Then $g$ changes only when $x$ moves parallel to the active subspace; if $x$ moves in an orthogonal direction, $g$ stays constant.
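To make this concrete, here is a tiny numerical illustration (a sketch only; the choice of $f$, the dimension, and the random vectors are arbitrary assumptions): moving $x$ orthogonally to $u$ leaves $g$ unchanged, while moving along $u$ changes it.

import numpy as np

rng = np.random.default_rng(0)
d = 300
u = rng.standard_normal(d)
u /= np.linalg.norm(u)              # unit vector defining the single active direction

f = np.cos                          # any scalar function works for this illustration
g = lambda x: f(u @ x)              # g(x) = f(u^T x)

x = rng.standard_normal(d)
v = rng.standard_normal(d)
v -= (u @ v) * u                    # remove the component along u, so v is orthogonal to u

print(g(x), g(x + 3.0 * v))         # (numerically) equal: g does not change orthogonally to u
print(g(x), g(x + 3.0 * u))         # different: g changes along u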

For many problems, although the function $g$ is not exactly of the form given above, it is often the case that there exists a function $f$ and matrix $u$ such that $g(x) \approx f(u^T x)$ is a good approximation.

One of the powerful properties of deep neural networks is that they are able to automatically detect such active subspaces and can provide good models for this class of functions. We will explore some of these ideas in this homework.

We are interested in approximating the following function

\[y = \exp \Big(-\frac{(u^Tx)^2}{d} \Big),\]

using a deep neural network. The input $x \in \mathbb R^d$ is a high-dimensional vector, where $d = 300$ in our case.
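For reference, with one sample per row the target can be evaluated in a vectorized way as follows (a sketch only; in the homework the data, including $u$, is loaded by the starter notebook, and the sample count here is an arbitrary assumption):

import numpy as np

d = 300
rng = np.random.default_rng(1)
u = rng.standard_normal((d, 1))
u /= np.linalg.norm(u)                  # direction defining the active subspace

x = rng.standard_normal((1024, d))      # one sample per row, as Keras expects
y = np.exp(-(x @ u) ** 2 / d)           # y_i = exp(-(u^T x_i)^2 / d), shape (1024, 1)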

Launch high_dim.ipynb to get started. The code for data loading is provided at the beginning of the notebook.

In this homework, to accelerate training, we will use the L-BFGS-B optimizer from SciPy to update the neural network, instead of the TensorFlow Adam optimizer we used in the previous homework. In our benchmarks, this is significantly faster (by a factor of roughly 10x). The code showing how to use the SciPy optimizer is provided at the beginning of high_dim.ipynb.
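For context, the usual pattern behind such a wrapper is to flatten the trainable variables into a single vector, hand SciPy a function that returns the loss and its gradient, and copy the optimized vector back into the model. Below is a minimal sketch of that idea (the helper name scipy_lbfgs_fit and all of its details are assumptions for illustration; the notebook's model_fit is what you should actually call):

import numpy as np
import scipy.optimize
import tensorflow as tf

def scipy_lbfgs_fit(model, x_train, y_train, max_iter=500):
    """Illustrative sketch: train a scalar-output Keras regression model with SciPy's L-BFGS-B."""
    x = tf.convert_to_tensor(x_train, dtype=tf.float32)
    y = tf.reshape(tf.convert_to_tensor(y_train, dtype=tf.float32), (-1, 1))  # match model output shape
    loss_fn = tf.keras.losses.MeanSquaredError()
    model(x[:1])                                      # build the model so its variables exist
    variables = model.trainable_variables
    shapes = [v.shape for v in variables]
    sizes = [int(np.prod(s)) for s in shapes]

    def assign_flat(theta):
        # Copy a flat parameter vector back into the model's variables.
        offset = 0
        for v, size, shape in zip(variables, sizes, shapes):
            v.assign(tf.reshape(tf.cast(theta[offset:offset + size], tf.float32), shape))
            offset += size

    def loss_and_grad(theta):
        # SciPy passes a float64 vector; return the loss and its gradient w.r.t. theta.
        assign_flat(theta)
        with tf.GradientTape() as tape:
            loss = loss_fn(y, model(x, training=True))
        grads = tape.gradient(loss, variables)
        flat_grad = tf.concat([tf.reshape(g, [-1]) for g in grads], axis=0)
        return float(loss.numpy()), flat_grad.numpy().astype(np.float64)

    theta0 = tf.concat([tf.reshape(v, [-1]) for v in variables], axis=0).numpy().astype(np.float64)
    result = scipy.optimize.minimize(loss_and_grad, theta0, jac=True,
                                     method="L-BFGS-B", options={"maxiter": max_iter})
    assign_flat(result.x)                             # leave the model at the optimized parameters
    return result

The provided model_fit additionally accepts validation_data and epochs arguments (as shown below), so use it directly in your notebook rather than this sketch.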

Set up your model using Keras as usual, e.g. using

model = tf.keras.Sequential()
...

After setting up your model, use:

model.compile(loss=tf.keras.losses.MeanSquaredError())
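For example, a compile-ready model might look like the following (the depth, widths, and activation are placeholder assumptions to illustrate the syntax, not a recommended configuration):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(300,)),             # d = 300 input features
    tf.keras.layers.Dense(32, activation="tanh"),
    tf.keras.layers.Dense(32, activation="tanh"),
    tf.keras.layers.Dense(1),                 # single regression output
])
model.compile(loss=tf.keras.losses.MeanSquaredError())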

Then, to train a model with x_train and y_train, run

result = model_fit(model, x_train, y_train, validation_data=(x_val, y_val), epochs=n_epochs)

This replaces the usual Keras command

model.fit(x_train, y_train, epochs=n_epochs)

Check the function signature of model_fit for the other parameters.

  1. Build a neural network and train it with x_train (input) and y_train (output).

Similar to what you did in Homework 2, explore different neural network configurations (e.g., layer depth and width) and regularization. Make sure the training process converges and that the model does not overfit, as measured on the validation set x_val and y_val.
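One possible way to organize this search is sketched below (the widths, the tanh activation, and the L2 strength are placeholder assumptions; model_fit, x_train, y_train, x_val, y_val, and n_epochs come from the starter notebook):

import tensorflow as tf

for width in (16, 32, 64):
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(300,)),
        tf.keras.layers.Dense(width, activation="tanh",
                              kernel_regularizer=tf.keras.regularizers.L2(1e-4)),
        tf.keras.layers.Dense(width, activation="tanh",
                              kernel_regularizer=tf.keras.regularizers.L2(1e-4)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(loss=tf.keras.losses.MeanSquaredError())
    result = model_fit(model, x_train, y_train,
                       validation_data=(x_val, y_val), epochs=n_epochs)
    # Compare the final training and validation losses across widths to pick a
    # configuration that converges without overfitting.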

Submission instructions: turn in the code in your notebook to build, compile and train the model. In addition, answer the following questions:

We will inspect how your trained model performs on some special inputs. Points in $\mathbb R^d$ can be decomposed into a component along $u$ and a component orthogonal to $u$:

\[x_{i,j} = b_i u + a_j u^o_i\]

where $b_i$ and $a_j$ are scalars and the $u^o_i$ are vectors orthogonal to $u$.

  2. For the values of $x_{i,j}$ given above, discuss the value of the ground-truth output
\[y=\exp \Big(-\frac{(u^T x_{i,j})^2}{d} \Big)\]

as you vary $b_i$ and $a_j$.

We vary $a_j$ to produce different samples along the line in the direction $u^o_i$. In the starter code,

x_val_otg[i,j,:]

represents sample $j$ ($j=$ 1,…,128) along line $i$ ($i=$ 1,…,4).

  3. For each line (varying $j$), plot both the ground-truth and the predicted (by your trained model) output value $y$ versus the index $j$. You will get 4 lines, one per index $i$. In the figure, clearly mark which curves are predictions and which are ground truths using a matplotlib legend. Include the plots of all lines in one figure. Example code for plotting is provided in the starter code.

Submission instructions: turn in the code in your notebook for plotting, and the plot with the ground-truth and predicted output values $y$ of all 4 lines. In addition, comment on your observations. Do your results make sense? Is this what you expected? How accurate is your prediction?
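In case the intent is unclear, plotting along these lines might look roughly like the sketch below (it assumes x_val_otg, u, and your trained model as set up in the starter notebook; the example plotting code actually provided there may differ in its details):

import numpy as np
import matplotlib.pyplot as plt

d = 300
n_lines = x_val_otg.shape[0]                              # 4 lines
plt.figure()
for i in range(n_lines):
    x_line = x_val_otg[i]                                 # shape (n_sample_plot, d)
    y_true = np.exp(-(x_line @ np.squeeze(u)) ** 2 / d)   # ground truth from the formula
    y_pred = model.predict(x_line).squeeze()              # model prediction
    plt.plot(y_true, label=f"ground truth, line {i}")
    plt.plot(y_pred, "--", label=f"prediction, line {i}")
plt.xlabel("sample index j")
plt.ylabel("y")
plt.legend()
plt.show()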

We now consider a new set of lines, parallel to $u$:

\[x_{i,j} = b_i u^o_i + a_j u\]

The setup is similar to the previous question. These values are stored in

x_val_prl[i,j,:]

  4. For each line, plot both the ground-truth and the predicted value $y$. Include the plots for all indices $i$ in one figure. In the figure, clearly mark which curves are predictions and which are ground truths using a matplotlib legend.

Submission instructions: turn in the code in your notebook for plotting, and the plot with the ground-truth and predicted output values $y$ of all 4 lines. In addition, comment on your observations.

The training set that was provided consists of random points that are normally distributed with mean 0 and standard deviation 1 (see the code that initializes x_train). As a result, we cannot expect the DNN to be accurate when $x$ is far from the origin.

You can tune the points x_val_prl[i,j,:] by adjusting the last argument of

uniform_distribution(j,n_sample_plot,5)

With this choice, $a_j$ ranges from $-5$ to $5$. Concretely, change the line

x_val_prl[i,j,:] = bias[i] * dir_otg[:,i] + uniform_distribution(j,n_sample_plot,1) * np.squeeze(u)

to

x_val_prl[i,j,:] = bias[i] * dir_otg[:,i] + uniform_distribution(j,n_sample_plot,5) * np.squeeze(u)

  5. Follow the same instructions as in Question 4 and repeat the experiment.

Submission instructions: turn in the code in your notebook for plotting, and the plot with the ground-truth and predicted output values $y$ of all 4 lines. In addition, comment on your observations. For what range of $a_j$ values is the prediction accurate? Where does the error become larger? Is this consistent with our choice of points in the training set?